Search | BVS Regional Portal

Quality, Accuracy, and Bias in ChatGPT-Based Summarization of Medical Abstracts.

Hake, Joel; Crowley, Miles; Coy, Allison; Shanks, Denton; Eoff, Aundria; Kirmer-Voss, Kalee; Dhanda, Gurpreet; Parente, Daniel J.

Ann Fam Med ; 22(2): 113-120, 2024.

Article in English | MEDLINE | ID: mdl-38527823

ABSTRACT

PURPOSE: Worldwide clinical knowledge is expanding rapidly, but physicians have little time to review the scientific literature. Large language models (eg, Chat Generative Pretrained Transformer [ChatGPT]) might help summarize research articles and prioritize which ones to review. However, large language models sometimes "hallucinate" incorrect information. METHODS: We evaluated ChatGPT's ability to summarize 140 peer-reviewed abstracts from 14 journals. Physicians rated the quality, accuracy, and bias of the ChatGPT summaries. We also compared human and ChatGPT ratings of relevance to various areas of medicine. RESULTS: ChatGPT produced summaries that were 70% shorter (mean length decreased from 2,438 characters for abstracts to 739 characters for summaries). Summaries were nevertheless rated as high quality (median score 90, interquartile range [IQR] 87.0-92.5; scale 0-100), high accuracy (median 92.5, IQR 89.0-95.0), and low bias (median 0, IQR 0-7.5). Serious inaccuracies and hallucinations were uncommon. ChatGPT's classification of the relevance of entire journals to various fields of medicine closely mirrored physician classifications (nonlinear standard error of the regression [SER] 8.6 on a scale of 0-100). However, its relevance classification for individual articles was much less accurate (SER 22.3). CONCLUSIONS: Summaries generated by ChatGPT were, on average, 70% shorter than the original abstracts and were characterized by high quality, high accuracy, and low bias. Conversely, ChatGPT had only a modest ability to classify the relevance of individual articles to medical specialties. We suggest that ChatGPT can help family physicians accelerate review of the scientific literature, and we have developed software (pyJournalWatch) to support this application. Life-critical medical decisions should remain based on full, critical, and thoughtful evaluation of the full text of research articles in the context of clinical guidelines.
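The abstract reports two derived quantities: the relative shortening of the summaries and a nonlinear standard error of the regression (SER) for relevance ratings. The sketch below shows one way each might be computed. It assumes SER takes its usual definition, the square root of the residual sum of squares divided by the residual degrees of freedom (n - p, where p is the number of fitted parameters); the abstract does not state which curve was fitted or its p, so the function names and the `n_params` argument are illustrative assumptions, not the authors' code.

```python
import math
from typing import Sequence


def percent_shorter(original_len: float, summary_len: float) -> float:
    """Percentage reduction in length from the original text to its summary."""
    return 100.0 * (1.0 - summary_len / original_len)


def standard_error_of_regression(
    observed: Sequence[float],
    predicted: Sequence[float],
    n_params: int,
) -> float:
    """SER = sqrt(sum of squared residuals / (n - n_params)).

    `predicted` holds fitted values from whatever (possibly nonlinear) curve
    was fitted; `n_params` is that curve's number of free parameters
    (an assumption here, since the abstract does not report it).
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    dof = len(observed) - n_params
    if dof <= 0:
        raise ValueError("need more observations than fitted parameters")
    ssr = sum((y - yhat) ** 2 for y, yhat in zip(observed, predicted))
    return math.sqrt(ssr / dof)


# Mean abstract length 2,438 characters vs. mean summary length 739 characters.
print(f"{percent_shorter(2438, 739):.1f}% shorter")  # 69.7% shorter
```

Computed from the two reported means, the reduction is 69.7%, which rounds to the 70% stated. Because SER is expressed in the units of the rating scale (0-100), an SER of 22.3 means article-level predictions typically missed physician ratings by roughly 20 points.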

Subjects

Medicine, Humans, Physicians, Family

Machine Learning Prediction of Urine Cultures in Primary Care.

Parente, Daniel; Shanks, Denton; Yedlinsky, Nicole; Hake, Joel; Dhanda, Gurpreet.

Ann Fam Med ; 21(Suppl 1), 2023 Jan 01.

Article in English | MEDLINE | ID: mdl-36972528

ABSTRACT

Context: Antibiotics for suspected urinary tract infection (UTI) are appropriate only when an infection is present. Urine culture is definitive but takes >1 day to result. A machine learning urine culture predictor was recently devised for Emergency Department (ED) patients but requires urine microscopy ("NeedMicro" predictor), which is not routinely available in primary care (PC). Objective: To adapt this predictor to use only features available in primary care and to determine whether its predictive accuracy generalizes to the primary care setting. We call this the "NoMicro" predictor. Study Design and Analysis: Multicenter, retrospective, observational, cross-sectional analysis. Machine learning predictors were trained using extreme gradient boosting, artificial neural networks, and random forests. Models were trained on the ED dataset and evaluated on both the ED dataset (internal validation) and the PC dataset (external validation). Setting: Emergency department and family medicine clinic of United States (US) academic medical centers. Population Studied: 80,387 (ED, previously described) and 472 (PC, newly curated) US adults. Instrument: Physicians performed retrospective chart review. The primary outcome extracted was pathogenic urine culture growing ≥100,000 colony forming units. Predictor variables included age; gender; dipstick urinalysis nitrites, leukocytes, clarity, glucose, protein, and blood; dysuria; abdominal pain; and history of UTI. Outcome Measures: Overall discriminative performance of the predictor (receiver operating characteristic area under the curve, ROC-AUC), performance statistics (e.g., sensitivity, negative predictive value), and calibration. Results: The "NoMicro" model performed similarly to the "NeedMicro" model in internal validation on the ED dataset: NoMicro ROC-AUC 0.862 (95% CI: 0.856-0.869) vs. NeedMicro 0.877 (95% CI: 0.871-0.884). External validation on the primary care dataset also yielded high performance (NoMicro ROC-AUC 0.850 [95% CI: 0.808-0.889]), even though the model was trained on Emergency Department data. Simulation of a hypothetical, retrospective clinical trial suggests the NoMicro model could be used to avoid antibiotic overuse by safely withholding antibiotics from low-risk patients. Conclusions: The hypothesis that the NoMicro predictor generalizes to both PC and ED contexts is supported. Prospective trials to determine the real-world impact of using the NoMicro model to reduce antibiotic overuse are appropriate.
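The performance statistics named in this abstract can be illustrated with a small, dependency-free sketch. It computes ROC-AUC through its rank interpretation (the probability that a randomly chosen culture-positive case receives a higher predicted risk than a randomly chosen culture-negative case) and, for a chosen risk cutoff, sensitivity and negative predictive value. The function names, the toy labels and scores, and the 0.5 cutoff are hypothetical; the study's actual models (extreme gradient boosting, neural networks, random forests) and thresholds are not reproduced here.

```python
from typing import Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC via its rank (Mann-Whitney) interpretation.

    Returns the probability that a randomly chosen positive case (label 1)
    has a higher score than a randomly chosen negative case (label 0);
    tied scores count as half a win. O(n_pos * n_neg), fine for small data.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_npv(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Sensitivity and negative predictive value, calling score >= threshold 'positive'."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    npv = tn / (tn + fn) if tn + fn else float("nan")
    return sensitivity, npv


# Hypothetical culture results (1 = pathogenic growth) and model risk scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.7, 0.6, 0.4, 0.3, 0.1]
print(round(roc_auc(labels, scores), 3))  # 0.889
print(sensitivity_npv(labels, scores, threshold=0.5))
```

The rank formulation makes clear that ROC-AUC depends only on how the risk scores are ordered, not on their absolute values, which is why calibration (whether, say, a predicted 20% risk corresponds to roughly 20% culture positivity) is reported as a separate outcome measure.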

Subjects

Urinalysis, Urinary Tract Infections, Adult, Humans, Urinary Tract Infections/diagnosis, Urinary Tract Infections/drug therapy, Retrospective Studies, Prospective Studies, Cross-Sectional Studies, Microscopy, Anti-Bacterial Agents/therapeutic use, Machine Learning, Emergency Service, Hospital, Primary Health Care, Urine

Adaptation and External Validation of Pathogenic Urine Culture Prediction in Primary Care Using Machine Learning.

Dhanda, Gurpreet; Asham, Mirna; Shanks, Denton; O'Malley, Nicole; Hake, Joel; Satyan, Megha Teeka; Yedlinsky, Nicole T; Parente, Daniel J.

Ann Fam Med ; 21(1): 11-18, 2023.

Article in English | MEDLINE | ID: mdl-36690486

ABSTRACT

BACKGROUND: Urinary tract infection (UTI) symptoms are common in primary care, but antibiotics are appropriate only when an infection is present. Urine culture is the reference standard test for infection, but results take >1 day. A machine learning predictor of urine cultures (the NeedMicro classifier) showed high accuracy for an emergency department (ED) population but required urine microscopy features that are not routinely available in primary care. METHODS: We redesigned the classifier (NoMicro) so that it does not depend on urine microscopy and retrospectively validated it internally (ED data set) and externally (on a newly curated primary care [PC] data set) using a multicenter approach including 80,387 (ED) and 472 (PC) adults. We constructed machine learning models using extreme gradient boosting (XGBoost), artificial neural networks, and random forests (RFs). The primary outcome was pathogenic urine culture growing ≥100,000 colony forming units. Predictor variables included age; gender; dipstick urinalysis nitrites, leukocytes, clarity, glucose, protein, and blood; dysuria; abdominal pain; and history of UTI. RESULTS: Removal of microscopy features did not severely compromise performance under internal validation: NoMicro/XGBoost receiver operating characteristic area under the curve (ROC-AUC) 0.86 (95% CI, 0.86-0.87) vs NeedMicro 0.88 (95% CI, 0.87-0.88). Excellent performance in external (PC) validation was also observed: NoMicro/RF ROC-AUC 0.85 (95% CI, 0.81-0.89). Retrospective simulation suggested that NoMicro/RF can be used to safely withhold antibiotics from low-risk patients, thereby avoiding antibiotic overuse. CONCLUSIONS: The NoMicro classifier appears appropriate for PC. Prospective trials to adjudicate the balance of benefits and harms of using the NoMicro classifier are appropriate.
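The retrospective simulation of withholding antibiotics can be sketched as a threshold sweep: patients whose predicted risk falls below a cutoff are assumed not to receive antibiotics, and each cutoff is scored by how many culture-negative patients would be spared antibiotics and how many culture-positive patients would be missed. This is a hypothetical illustration with made-up data; the abstract does not describe the authors' simulation protocol or the cutoff they chose, and the names `simulate_withholding` and `WithholdResult` are invented for this sketch.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class WithholdResult:
    threshold: float
    withheld: int  # patients below the cutoff (antibiotics withheld)
    avoided: int   # withheld patients whose culture was negative
    missed: int    # withheld patients whose culture was positive

    @property
    def miss_rate(self) -> float:
        """Fraction of withheld patients with a positive culture (1 - NPV)."""
        return self.missed / self.withheld if self.withheld else 0.0


def simulate_withholding(
    labels: Sequence[int], risks: Sequence[float], thresholds: Sequence[float]
) -> List[WithholdResult]:
    """For each cutoff, tally outcomes among patients whose risk is below it."""
    results = []
    for t in thresholds:
        withheld_labels = [y for y, r in zip(labels, risks) if r < t]
        missed = sum(withheld_labels)
        avoided = len(withheld_labels) - missed
        results.append(WithholdResult(t, len(withheld_labels), avoided, missed))
    return results


# Hypothetical culture outcomes (1 = pathogenic growth) and predicted risks.
labels = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
risks = [0.02, 0.05, 0.08, 0.10, 0.15, 0.40, 0.45, 0.70, 0.85, 0.95]
for res in simulate_withholding(labels, risks, [0.05, 0.10, 0.20]):
    print(res.threshold, res.withheld, res.avoided, res.missed, round(res.miss_rate, 2))
```

Raising the cutoff spares more patients from antibiotics but, past some point, begins withholding treatment from true infections (here the 0.20 cutoff misses one); where to stop is the balance of benefits and harms that the authors propose prospective trials should adjudicate.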

Subjects

Urinalysis, Urinary Tract Infections, Adult, Humans, Retrospective Studies, Prospective Studies, Microscopy, Urinary Tract Infections/diagnosis, Anti-Bacterial Agents, Machine Learning, Primary Health Care/methods

Birth Outcomes Related to Distance in Rural and Frontier Kansas.

Montelongo, Janet A; Hake, Joel; Liese, Bruce S; Kennedy, Michael.

Kans J Med ; 15: 319-324, 2022.

Article in English | MEDLINE | ID: mdl-36196106

ABSTRACT

Introduction: Women from rural communities must travel greater distances to obtain obstetrical care. This study sought to determine the extent to which the distance mothers traveled for obstetrical services affects birth outcomes in rural and frontier counties of Kansas. Methods: Medical students invited women over the age of 18 to participate in a recall survey regarding their children under three years old. Participants were a convenience sample, and data were collected over one month. A bivariate analysis was performed on the reported obstetrical measures as a function of self-reported distance traveled to the hospital of delivery. Results: Eighty-five women completed the survey, but only 76 satisfied all eligibility requirements. No statistically significant difference in birth outcomes was found between women who traveled more than 20 miles and those who traveled less. However, when the survey data were compared with data from the Kansas Hospital Association and the Kansas Department of Health and Environment, counties without birth facilities had a higher percentage of very low birth weight babies (< 1,500 grams) and more babies born full-term than counties that offer birth facilities. Babies born to mothers who reside in counties with obstetrical services were born at an earlier gestational age than babies born to mothers in counties without birth facilities. Lastly, babies born into families with income less than $50,000 weighed less and had a shorter gestational age than those from more affluent households. Conclusions: The results revealed counterintuitive findings that deserve further exploration in a study with greater statistical power.
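The abstract does not name the test used in its bivariate analysis. As one common option for a dichotomized exposure (more vs. less than 20 miles) and a binary outcome, a Pearson chi-square test on a 2×2 table can be computed with the standard library alone. The sketch below is illustrative only: the function name and the counts are invented and are not the study's data. In samples this small, expected cell counts often fall below about 5 (as in the example), in which case an exact test is usually preferred.

```python
import math
from typing import Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square (no continuity correction) for the table

                      outcome yes   outcome no
        > 20 miles         a             b
        <= 20 miles        c             d

    Returns (statistic, p-value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("a row or column total is zero")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts (76 respondents): an adverse outcome (yes/no) by distance.
stat, p = chi_square_2x2(a=4, b=30, c=5, d=37)
print(f"chi2 = {stat:.3f}, p = {p:.3f}")
```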
